Chore/harden perspicacite#11
Open
lfnothias wants to merge 29 commits into
Open
Conversation
…generate_report generate_report returned LLM-synthesized `report` text alongside factual `sources`/`papers_used` with no marker distinguishing model-authored prose from verified metadata, and the synthesis prompts treated retrieved chunks as trusted canon — so a poisoned preprint's embedded instructions could steer the report (addresses #1). - Add a non-breaking `_provenance` envelope to both report payloads: provider, model, rag_cycles_executed, ai_generated_fields=["report"], and untrusted_sources (all retrieved sources are attacker-influenceable). - Wrap retrieved chunk bodies in [UNTRUSTED_DOCUMENT] markers (citation headers stay outside) and append a clause to the system/mandatory prompts instructing the model to treat that content strictly as data, never as instructions. The verbatim v1 MANDATORY_PROMPT_TEMPLATE is left untouched; the `report` field is not renamed (would break ASB/Scriptorium/Mimosa consumers). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
claims_to_graph emitted every qualifier on asb:qualifier, but that slot is a closed Bucur enum in indicium — domain-adapter qualifiers (e.g. aligned_with) failed SHACL. Route by membership: core Bucur terms stay on asb:qualifier; domain terms go on the open asb:domainQualifier slot added in indicium 1.7.0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Tags each claim with verify_quote status, emits a JSONL audit sidecar, and applies strict (drop unverified) vs fail-open (keep tagged) policy. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nt path Addresses B1 review: guard the candidates comprehension against non-dict passages (the one unguarded line that could raise), document in-place mutation and the strict-ignored-when-verifier-unavailable behavior, and add a test that simulates the indicia extra being absent (degrades all to unverified, kept). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the positional chunk_idx bug (builder.py) with verify_quote's matched_index; writes quote_exact + oa:TextQuoteSelector on verified/repaired evidence and only an anchorStatus marker on unverified (no false selector). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The test bound a graph for "kbAnchor" without redirecting the on-disk manifest, so after the first run the persisted manifest marked the paper unchanged and subsequent runs skipped extraction (empty graph -> assertion failed). Monkeypatch manifest._DATA_DIR to tmp_path, matching every other build_claim_graph test. Also sort the new imports and drop the now-unused enumerate index left by the positional->content-match rewrite. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ig to builder annotate_anchor_status tags each extracted record (verified/repaired/ unverified/unchecked) before the license rewrite; MCP build_claim_graph now passes config.anchor.* (strict/near_threshold/audit_dir) into the builder. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The build_claim_graph MCP tool now reads config.anchor.* (R3), so its unit test must give the fake MCPState a real Config (was None). Also moves the type-only pathlib.Path import under TYPE_CHECKING and drops a stale noqa in anchor.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drives build_claim_graph with a fabricated quote (verbatim in no passage) and asserts the claim is kept (fail-open) but tagged unverified with zero quoteExact / oa:TextQuoteSelector triples — pinning the no-laundering invariant at the builder layer where the selector logic lives. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wire the MCP claim-extraction tool through the R3 anchor invariant: after
extract_claims, run anchor_claims so every returned claim carries an _anchor
record {status, matched_index, quote_exact, score}. quote_exact is populated
only for verified/repaired claims — never fabricated for an unverified quote
(no-laundering holds at the tool layer, not just the builder).
Also fix a latent blank-passage bug: get_relevant_passages emits flat
{text, source_doi}, but extract_claims and anchor_claims both read
{chunk_text, source:{doi}}. Normalize once up front so the model sees real
passage text and the verifier gets real candidates — otherwise anchoring would
silently see empty text and tag everything unverified.
Pin indicium>=1.8.0 (the floor that ships the verify_quote kernel the anchor
path imports) and add two unit tests: a verbatim quote anchors verified with
quote_exact recovered; a fabricated quote is kept but tagged unverified with
quote_exact=None.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The retry predicate now requires isinstance(e, Exception) so that asyncio.CancelledError (a BaseException, which tenacity otherwise catches) propagates instead of being retried. Without this, every caller-side asyncio.wait_for/timeout was ineffective — all 3 retry attempts ran their full HTTP timeout before the stale deadline was noticed, so the structured extraction tools blew past their 80s cap (observed 115-151s). The extract_failure_modes / extract_parameters calls now return within the cap. route_kbs gains a kb_names alias for candidate_kbs: every other multi-KB tool names that argument kb_names, so agents reach for it first and the mismatch previously errored the first call. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Flip the CI ruff gate from report-only (--exit-zero) to enforcing, pin ruff to 0.15.13, and align CI + mypy to Python 3.12 (matches requires-python). Add .pre-commit-config.yaml running the same pinned ruff (lint + format-on-touched-files only, per the pyproject note). Resolve all outstanding ruff findings via the established fix-vs-ignore policy: fix real issues (B904 exception chaining, F401/F841 dead code, RUF012 ClassVar, E731, UP035, SIM105, B905 zip strict=) and config-ignore genuine false positives with justifications (E402, UP042, SIM102, SIM117, TC001-3, RUF001-3, N806/N814, B008, plus test-idiom per-file ignores). Also fix two regressions from over-eager autofixes surfaced by the suite: - jobs/registry.py: SIM118 had rewritten `for k in row.keys()` on an aiosqlite.Row (iterating a Row yields values, not keys) -> IndexError. Restored .keys() with a justified noqa. - cite_graph.py: restored openalex_id_for_doi (a test monkeypatch target) with a justified noqa: F401. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adopt a pragmatic-but-enforced mypy config: semantic bug-catchers stay on (attr-defined, arg-type, return-value, index, assignment, ...) while annotation-completeness rules stay off to avoid churn in a research codebase. Drove all findings to zero across src/, fixing several real latent bugs surfaced along the way: StreamEvent payloads passed a dict where a str was required (now json.dumps), a wrong base64.binascii.Error reference, non-indexable Iterable[Row] from execute_fetchall, a referenced-but-deleted except variable, and a missing DOI None-guard. Added `mypy src/` to the CI lint job so the gate holds going forward. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Audit of bare `except Exception` swallows: narrow defensive parse/nav handlers to precise exception types (KeyError/TypeError/AttributeError, ValueError, json.JSONDecodeError, ImportError) so unexpected errors propagate instead of being silently eaten; keep best-effort side-effect handlers broad but add logger.debug observability. Behavior-preserving: identical control flow, no new suppression. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Move the seven implementation helpers (_start_mcp_and_web, _run_query, the bibtex KB builders, _build_app_state_for_cli, and the ingest summary printers) out of the 2000-line cli.py into a dedicated cli_helpers module, re-importing them so perspicacite.cli.<name> stays a valid monkeypatch target for the existing unit tests. First slice of the god-module split; the larger server.py/orchestrator splits remain a dedicated follow-up. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reconciles the hardening work (ruff/mypy CI gate + cli_helpers decomposition) with what landed on main after the branch was cut: the R3 anchor invariant (PR #10) and the claims markdown-fence fix (PR #9). The anchor logic was already present on this branch via its feat/domain-qualifier-routing base, so it reconciles cleanly; PR #9's _strip_markdown_fence is the net new content. Conflicts (both resolved keeping BOTH anchor logic and lint cleanups): - src/perspicacite/indicium_layer/builder.py: dropped unused INDICIUM_NS import (kept the W1 F401 cleanup; anchor code unaffected). - tests/unit/test_claims_validation.py: kept the isort blank line separating third-party from first-party imports. Verified on the merged tree: ruff clean, mypy clean (241 files), 2487 passed / 1 skipped. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CI's lint and test jobs install base deps only (no `indicia` extra), so rdflib/pyoxigraph/indicium are absent there while present locally. Two pre-existing failures surfaced after the main merge: - mypy lint: the ignore_missing_imports override omitted rdflib/pyoxigraph, and store.py aliased `ConjunctiveGraph = Dataset` — a local assignment that is invalid as a type annotation. Add both modules to the override list, drop the alias (use Dataset directly), and disable warn_unused_ignores for indicium_layer.store where the rdflib union-attr ignore is env-dependent. - smoke test: _TOOL_ARGS lacked entries for 8 web/skill/claim-graph tools, so test_tool_inventory called them with no args and hit TypeErrors. Supply minimal valid args; claim-graph tools short-circuit cleanly at `import indicium` in CI. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…etry test CI installs only ".[dev]" via plain pip, which cannot resolve the local-path `indicium` source, so indicium/rdflib/pyoxigraph/indicium_adapters are never present in CI. Guard every claim-graph / verify-kernel test with pytest.importorskip so they skip cleanly there while still running locally under `uv sync --extra indicia`. Mixed modules use per-test guards to preserve their non-indicia coverage rather than skipping the whole module. Also deselect test_domain_aggregator::test_retry_counts_as_one_circuit_failure from the safe subset: it exercises real exponential-backoff retry sleeps (~27s) and exceeds the 15s safe-subset timeout. It passes under a longer timeout and is unrelated to the indicia extra. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…on links Move the 18 root-level config files into a tidy config/ tree to declutter the repo root: - config/example.yml (was config.example.yml) - config/providers/*.yml (6 provider presets, was config.*.example.yml) - config/embeddings/*.yml (11 model presets, was config_*.yml) Names are cleaned (drop config./config_ prefix and .example infix, underscores -> hyphens). All references updated across docs, README, help/error strings (agent_cli, claude_cli, check_install), and tests. The config-audit test glob now points at config/providers + config/example. Also add proper citation links to the README (CEUR-WS PDF + HAL archive) in both the human-readable block and the BibTeX entry, and extend .gitignore with tooling caches (.mypy_cache, .pytest_cache, .coverage) and local claim-graph stores. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- CI: enable actions/setup-python pip caching (keyed on pyproject.toml) for both the lint and test jobs to speed up dependency installs. - Add .github/dependabot.yml: weekly grouped updates for pip and github-actions ecosystems. - pyproject.toml: add readme, authors, keywords, trove classifiers, and [project.urls] (Homepage/Repository/Issues/Changelog/Paper) for a release-quality package description. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Update inbound references to the relocated root files (INSTALL_AGENT.md, MANUAL_QA.md) in setup.md, docs/development/testing.md, and docs/index.md, and drop the now-stale root allowlist entries from .gitignore (the files live under the already-allowlisted docs/ tree). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Watch frontend/package-lock.json (grouped, weekly) so the frontend's npm dependencies get the same automated update treatment as the Python deps. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wire a third CI job for the Next.js frontend running lint, typecheck, unit tests, and build with npm caching. Add eslint flat config (Next 16 core-web-vitals; React-Compiler-era react-hooks rules set to warn for a green baseline) and a vitest suite covering the cost estimation helpers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Remove the blanket warn-downgrade of the React-Compiler react-hooks rules and address all 21 findings: genuinely unnecessary effects are refactored (redundant setState removed; reset-on-change moved to render-time guards; ChatPanel's cancel callback reordered above its consuming effect), and the remaining legitimate effect patterns (async mount fetches, SSR-safe localStorage hydration, intentional latches) carry per-line justified disables. The four rules are back at "error" so future violations fail CI. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add a log line to broad exception handlers that previously swallowed errors silently, so failures in optional-dep fallbacks, JSON parsing, download workers, and background jobs surface in logs. Behavior is preserved: control flow, return values, and re-raises are unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…chestrator Completes the silent-handler audit for the two modules deferred from the first pass. Each previously-silent broad except now logs one line (debug for best-effort/fallback paths, warning where user-meaningful work is dropped) before its existing fallback. Behavior is unchanged: control flow, return values, and caught types are preserved. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Checklist
uv run pytestuv run ruff check src/ tests/uv run mypy src/Test plan
Related issues